Zipf 's law in human heartbeat dynamics 
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It is shown that the distribution of low variability periods in the activity of human heart rate 
typically follows a multi-scaling Zipf's law. The presence or failure of a power law, as well as the 
values of the scaling exponents, are personal characteristics depending on the daily habits of the 
subjects. Meanwhile, the distribution function of the low- variability periods as a whole discriminates 
efficiently between various heart pathologies. This new technique is also applicable to other non- 
linear time-series and reflects these aspects of the underlying intermittent dynamics, which are not 
covered by other methods of linear- and nonlinear analysis. 

PACS numbers: PACS numbers: 87.10.+e, 87.80.Tq, 05.45.-a, 05.45.Tp 



The nonlinear and scale-invariant aspects of the heart 
rate variability (HRV) have been studied intensively dur- 
ing the last decades. This continuous interest to the HRV 
can be attributed to the controversial state of affairs: on 
the one hand, the nonlinear and scale-invariant analysis 
of HRV has resulted in many methods of very high prog- 
nostic performance (at least on test groups) Q 0j B §5 
on the other hand, practical medicine is still confident to 
the traditional "linear" methods. The situation is quite 
different from what has been observed three decades ago, 
when the "linear" measures of HRV became widely used 
as important noninvasive diagnostic and prognostic tools, 
soon after the pioneering paper Q . Apparently, there is 
a need for further evidences for the superiority of new 
methods and for the resolution of the existing ambigui- 
ties. 

During recent years the main attention of studies has 
been focused on the analysis of the scale-invariant meth- 
ods. It has been argued that measures related to a 
certain time-scale (e.g. 5 min) are less reliable, because 
the characteristic time-scales of physiological processes 
are patient-specific. The scale-invariant measures are of- 
ten believed to be more universal and sensitive to life- 
threatening pathologies j|, [|. However, carefully de- 
signed time-scale-related measures can be also highly 
successful, because certain physiological processes are re- 
lated to a specific time scale ||. 

The scale invariance has been exclusively seen in the 
heart rhythm following the (multi)fractional Brownian 
motion (fBm) || . It has been understood that the heart 
rhythm fluctuates in a very complex manner and reflects 
the activities of the subject (sleeping, watching TV, walk- 
ing etc.) 0, D an d cannot be adequately described by 
a single Hurst exponent of a simple fBm. In order to 
reflect the complex behavior of the heart rhythm, the 
multi-affine generalization of the fBm has been invoked 
[EL El; it has been claimed that the multifractal scaling 



exponents are of a significant prognostic value. 

The approach based on fBm addresses long-time dy- 
namics of the heart rhythm while completely neglecting 
the short-scale dynamics on time scales less than one 
minute (the respective frequencies are typically filtered 
out raj). The short-time variability has been described 
only by the so called linear measures, such as PNN50 (the 
probability that two adjacent normal heart beat intervals 
differ more than 50 milliseconds). Meanwhile, the level of 
the short-time variability of the human heart rate varies 
in a very complex manner, the high- and low- variability 
periods are deeply intertwined Q . This is a very impor- 
tant aspect, because the low- variability periods are the 
periods when the heart is in a stressed state, with high 
level of signals arriving from the autonomous nervous 
system. The conventional linear measures are not ap- 
propriate for describing such a complex behavior. Thus, 
there is a clear need for suitable nonlinear methods. 

In this paper we present a new scale-invariant descrip- 
tion of the short-time variability of the heart rate. It is 
shown that the distribution of low-variability periods in 
the activity of a normal heart follows the Zipf's law. It 
is also shown that the distribution function of the low- 
variability periods contains a considerable amount of di- 
agnostically valuable information. This new technique 
is also applicable to other non-linear time-series, such as 
EEG signals and financial data || . 

Our analysis is based on ambulatory Holter-monitoring 
data (recorded at Tallinn Diagnostic Centre) of 218 pa- 
tients with various diagnoses. The groups of patients 
are shown in Table 1. The sampling rate of ECG was 
180 Hz. The patients were monitored during 24 hour un- 
der normal daily activities. The preliminary analysis of 
the ECG recordings was performed using the commercial 
software; this resulted in the sequence of the normal-to- 
normal (NN) intervals ijv./v (measured in milliseconds), 
which are defined as the intervals between two subsequent 
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normal heartbeats (i.e. normal QRS complexes). 





Healthy 


IHD 


SND 


VES 


PCI 


RR 


FSK 


No. of patients 


103 


8 


11 


16 


7 


11 


6 


Mean age 


45.5 


65.4 


50.0 


55.9 


47.3 


55.5 


11.7 


Std. dev. of age 


20.5 


11.4 


19.3 


14.3 


11.6 


14.4 


4.6 



TABLE I: Test groups of patients. Abbreviations are as 
follows: IHD - Ischemic Heart Disease (Stenocardia); SND - 
Sinus Node Disease; VES - Ventricular Extrasystole; PCI - 
Post Cardiac Infarction; RR - Blood Pressure Disease; FSK - 
Functional Disease of Sinus Node. 



Originally, the Zipf 's law addressed the distribution of 
words in a language ]l2| : every word has assigned a rank, 
according to its "size" /, defined as the relative number 
of occurrences in some long text (the most frequent word 
obtains rank r = 1, the second frequent — r = 2, etc). 
The empirical size-rank distribution law /(r) cx r~ a is 
surprisingly universal: in addition to all the tested natu- 
ral languages, it applies also to many other phenomena. 
The scaling exponent a is often close to one (e.g. for the 
distribution of words). Typically, the Zipf's law is ap- 
plicable to a dynamical system at statistical equilibrium, 
when the following conditions are satisfied: (a) the sys- 
tem consists of elements of different size; (b) the element 
size has upper and lower bounds; (c) there is no inter- 
mediate intrinsic size for the elements. As already men- 
tioned, the human heart rhythm has a complex structure, 
where the duration r of the low-variability periods varies 
in a wide range of scales, from few to several hundreds of 
heart beats. Thus, one can expect that the distribution 
of the low-variability periods follows the Zipf's law 

r cx t -7 . (1) 

However, the scaling behavior should not be expected to 
be perfect. Indeed, the heart rate is a non-stationary sig- 
nal affected by the non-reproducible daily activities of the 
subjects. The non-stationary pattern of these activities, 
together with their time-scales, is directly reflected in 
the above mentioned distribution law. This distribution 
law can also have a fingerprint of the characteristic time- 
scale (around ten to twenty seconds) of the blood pres- 
sure oscillations. Finally, there is a generic reason why 
the Zipf's law fails (or is non-perfect) at small rank num- 
bers. The Zipf's law is a statistical law; meanwhile, each 
rank- length curve is based on a single measurement. Par- 
ticularly, there is only one longest low-variability period 
(and likewise, only one most-frequent word), the length 
of which is just as long as it happens to be, there is no av- 
eraging whatsoever. For large rank values r, the relative 
statistical uncertainty can be estimated as 1/ ' \fr. 

To begin with, we define the local variability for each 
(«-th) interbeat interval as the deviation of the heart rate 
from the local average, 

xf -s \tNN(i) ~ (t N N(i)) I fn s 



The angular braces denote the local average, calculated 
using a narrow (5 beats wide) Gaussian weight function. 
Further, we introduce a threshold value So', i-th interbeat 
interval is said to have a low variability, if the condition 

S(i) < So (3) 

is satisfied. A low-variability period is defined as a set of 
consecutive low-variability intervals; its length r is mea- 
sured in the number of heartbeats. Finally, all the low- 
variability periods are arranged according to their lengths 
and associated with ranks. The rank of a period is plot- 
ted versus its length in a logarithmic graph, see Fig. 1; 
Zipf's law would correspond to a straight descending line. 
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FIG. 1: Multi-scaling distribution of the low-variability peri- 
ods: the rank r of a period is plotted versus its duration r 
(measured in heartbeats) for different values of the threshold 
parameter So- 

For a very low threshold parameter So, all the low- 
variability periods are very short, because it is difficult 
to satisfy the stringent condition (3). In that case, the 
inertial range of scales is too short for a meaningful scal- 
ing law. On the other hand, for a very high value of 
So, there is a single low- variability period occupying the 
entire HRV-recording. Between these two cases, there is 
such a range of the values of So, which leads to a non- 
trivial rank-length law. For a typical healthy patient, the 
r(r)-curve is reasonably close to a straight line, and the 
scaling exponent 7 is a function of the threshold parame- 
ter So- Thus, unlike all the other well-known applications 
of the Zipf's law, we are dealing with a multi-scaling law. 

Recently, Ivanov et. al. have reported that anoma- 
lous multifractal spectra of the HRV signal indicate an 
increased risk of sudden cardiac death. Therefore, it is 
natural to ask, does the presence or failure of the multi- 
scaling behavior indicate the healthiness of the patient? 
In what follows we discuss a somewhat more general ques- 
tion: what is the relationship between the properties of 
the distribution function of the low variability periods 
and the diagnosis of the patient. Testing the prognostic 
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significance for predicting sudden cardiac death, which is 
also of a great importance, has been postponed due to 
the nature of our test groups. 

First, let us analyze the correlation between the diag- 
nosis of a patient and the scaling exponent 7. To begin 
with, we have to determine the optimal value for the 
threshold parameter 8q. For a meaningful analysis, the 
scaling behavior should be as good as possible. It turned 
out that for a typical patient, the best approximation 
of the function r(r) with a power law is achieved for 
Sq w 0.05 (see Fig. 2a); in what follows, all the values of 
the exponent 7 are calculated for Sq = 0.05. It should be 
noted that for some patients, the length-rank distribution 
is still far from a power law (see Fig. 2b). 

The slope of a curve on the logarithmic plot is calcu- 
lated using root-mean-square (rms) fit for such a range 
of lengths [r s t ar t, ''end], for which the r(r)-curve is nearly 
a power law, and the scaling range width A = lnr cn d — 
liiT s tait is as large as possible. Bearing in mind the statis- 
tical nature of the Zipf 's law and non-stationarity of the 
underlying signal, we have chosen a not very stringent 
definition of what is "nearly a power law", see Fig. 3. 
Around the rms-fit-line, two limit lines are drawn; r sta rt 
and Tend correspond to the points, where the r(r)-curve 
crosses the limit lines. 

Note that the precise placement and shape of the limit 
lines is arbitrary, i.e. small variations do not lead to qual- 
itative effects. Here, the distance of the limit lines from 
the central line has been chosen to be In 2 at t = r max , 
and zero at r = 1, where r max is the length of the 
longest low- variability period. Admitting mismatch In 2 
at r = r max is motivated by the observation that due to 
the lack of any statistics, the longest low-variability pe- 
riod could have been easily twice as long as we measured 
it to be. For large rank values, the statistical uncer- 
tainty is assessed as \fr\ in logarithmic graph, this would 
correspond to an exponentially decreasing (with increas- 
ing r) distance between a limit curve and the central 
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FIG. 2: Rank- length curves for a patient with a good power 
law (a) and for a patient with no power law (b). In both 
cases, the threshold parameter So = 0.05. 



fit-line. However, the above mentioned effect of the non- 
stationary pattern of the subjects daily activities makes 
the situation more complicated. There is no easy way 
to quantify this effect and therefore, we opted for the 
simplest possible solution, simple straight limit lines. 

The scaling exponent 7 has been calculated for all the 
patients and Student test was applied to every pair of 
groups. In most cases, the significance was quite low; 
two best distinguishable groups were RR and FSK, the 
result of Student test being 5.7%. Therefore, one can 
argue that the slopes of linear parts are highly personal 
characteristics depending also on the daily habits of the 
subjects, which are weakly correlated with diagnosis. 

Further we tested, how is the failure of the power law 
correlated with the diagnosis. The width of the scaling 
range A was used as a measure of how well the curve 
is corresponding to a power law. The Student test re- 
sults for the parameter A turned out to be similar to 
what has been observed for the parameter 7: the corre- 
lation between the failure of the power law and diagnosis 
was weak. Thus, a rank-length curve resembling the one 
depicted by a dashed line in Fig. 2, does not hint to 
heart pathology. It should be also noted that the dashed 
curve in Fig. 2 can be considered as a generalized form 
of scale- invariance with scale-dependent differential scal- 
ing exponent. Such a behavior seems to be universal; for 
instance, certain forest fire models lead to the differ- 
ential fractal dimension depending on the local scale. 

Finally, we analyzed the diagnostic significance of the 
parameters In r on d and In r sta rt ■ This analysis does make 
sense, because typically, the start- and end-points of the 
scaling range correspond to certain physiological time- 
scales. The parameter ln-r cn d provided, indeed, a remark- 
able resolution between the groups of patients, see Table 
2. According to the Student test, the healthy patients, 
were distinct from five heart pathology groups with prob- 
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FIG. 3: Definition of the width of the scaling interval A. The 
rank-length curve is fitted with a power law; the boundaries 
of the scaling interval are defined as the intersection points of 
limit lines and r(r)-curve. 
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p(%) 


Healthy 


IHD 


SND 


VES 


PCI 


RR 


FSK 


Healthy 




0.06 


17.21 


0.02 


0.07 


1.59 


1.55 


1HD 


0.36 




2.85 


96.79 


97.62 


21.93 


20.05 


SND 


2.99 


59.10 




2.10 


3.04 


25.77 


25.57 


VES 


0.08 


91.60 


63.79 




94.18 


17.59 


16.20 


PCI 


25.27 


21.61 


46.37 


22.89 




22.50 


20.62 


RR 


0.14 


73.57 


77.69 


80.49 


28.90 




98.20 


FSK 


46.48 


5.20 


8.72 


5.52 


20.06 


6.45 




Healthy 


£^ 


1.27 


43.12 


0.01 


6.27 


87.40 


73.99 


IHD 


4.82 




4.87 


90.04 


27.13 


6.11 


5.83 


SND 


47.91 


6.37 




3.81 


12.31 


55.50 


63.46 


VES 


0.34 


81.67 


6.02 




11.04 


3.69 


4.43 


PCI 


38.38 


18.24 


27.25 


12.40 




20.45 


17.37 


RR 


85.74 


6.80 


59.23 


4.01 


42.81 




88.81 


FSK 


65.87 


9.45 


81.38 


9.53 


38.30 


77.74 





TABLE II: p-values of the Student test. Data in the top- 
most triangular region (with label A) are calculated using the 
parameter lnr on( j. Triangular region B corresponds to the 
parameter lnr max , region C — to lnrioo, and region D — to 
lnr4o- Gray background highlights small p- values, p < 10%. 
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FIG. 4: Definition of the parameters r$o, rioo, and r2oo- 



ability p < 1.6%. The parameter lnr sta rt was diagnosti- 
cally less significant. 

Unfortunately, the calculation of the parameter r en d is 
technically quite a complicated task, not suited for clini- 
cal practice. Therefore, we aimed to find a simpler alter- 
native to it. Basically, the strategy was to find a simple 
parameter reflecting the behavior of the rightmost (large- 



t) part of the r(r)-curve. An easy option is lnT max , which 
has been already analyzed |11 . This parameter has in- 
deed a considerable diagnostic value, but its reliability 
is decreased by the above discussed statistical fluctua- 
tions. Better alternatives are provided by (a) the overall 
number of low- variability periods r max (which is small, if 
there are lot of long low- variability periods); (b) the coor- 
dinates of specific points of the rank-length curve. Here 
we chose a set of critical ranks R = 10, 20 or 40, and 
determined the respective lengths tr so that r(ru) = R. 
We also fixed a set of critical length values, T = 50, 
100, or 200, and determined the respective rank numbers 
rx = r(T), see Fig. 4. Both techniques turned out to be 
of high diagnostic performance; illustrative p-values are 
given in Table 2. Parameters t\q and T20 performed less 
well than T40 (for instance, the p-values for the healthy 
and VES-subject groups were 0.60%, 0.58% and 0.34%, 
respectively), and are not presented in tabular data. Sim- 
ilarly, rioo turned out to be more efficient than r§o and 
^200 (the respective healthy and VES-group p-values be- 
ing 0.02%, 0.01%, and 0.09%). It also outperforms r 40 , 
but is sometimes less efficient than r max or r cnc i (see Table 
2). Hence, various heart pathologies seem to affect the 
heart rate dynamics at the time scale around 100 heart 
beats (one to two minutes). 

In conclusion, new aspect of non-linear time-series has 
been discovered, the scale-invariance of low-variability 
periods. We have shown that the distribution of low vari- 
ability periods in the activity of human heart rate typi- 
cally follows a multi-scaling Zipf's law. The presence or 
failure of a power law, as well as the values of the scaling 
exponents, are personal characteristics depending on the 
daily habits of the subjects. Meanwhile, the distribution 
function of the low-variability periods as a whole con- 
tains also a significant amount of diagnostically valuable 
information, the most part of which is reflected by the 
parameters rioo, r max , and r cn d, see Table 2. These quan- 
tities characterize the complex structure of HRV signal, 
where the low- and high variability periods are deeply 
intertwined, aspect which is not covered by the other 
methods of heart rate variability analysis (such as frac- 
tional Brownian motion based multifractal analysis). As 
a future development, it would be of great importance 
to analyze the prognostic value of the above mentioned 
parameters for patients with sudden cardiac death. 
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